Efficient Group K Nearest-Neighbor Spatial Query Processing in Apache Spark

Abstract

Aiming at the problem of spatial query processing in distributed computing systems, the design and implementation of new distributed spatial query algorithms is a current challenge. Apache Spark is a memory-based framework suitable for real-time and batch processing. Spark-based systems allow users to work on distributed in-memory data without worrying about the data distribution mechanism and fault tolerance. Given two datasets of points (called Query and Training), the group K nearest-neighbor (GKNN) query retrieves the K points of Training with the smallest sum of distances to every point of Query. This query has been actively studied in centralized environments, and several performance-improving techniques and pruning heuristics have also been proposed, while a distributed algorithm in Apache Hadoop was recently proposed by our team. Since, in general, Hadoop exhibits lower performance than Spark, in this paper we present the first distributed GKNN query algorithm in Apache Spark and compare it against the one in Hadoop. The algorithm incorporates programming features and facilities that are specific to Spark. Moreover, techniques that improve performance and are applicable in Spark are also incorporated. The results of an extensive set of experiments with real-world spatial datasets are presented, demonstrating that our Spark GKNN solution, with its improvements, is efficient and a clear winner in comparison to processing this query in Hadoop.
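To make the GKNN definition concrete, here is a minimal brute-force sketch in plain Python. It is not the paper's Spark algorithm and uses none of its pruning heuristics. The function names, the Euclidean metric, and the sample points are assumptions made for illustration. The second function mimics, on a single machine, the general idea of computing a local top-K per partition and merging the candidates, which works when every partition sees the full Query set.

```python
import heapq
import math


def gknn_brute_force(query, training, k):
    """Return the k Training points with the smallest sum of
    Euclidean distances to all Query points (ascending by that sum)."""
    def sum_of_distances(point):
        return sum(math.dist(point, q) for q in query)

    return heapq.nsmallest(k, training, key=sum_of_distances)


def gknn_partitioned(query, training, k, num_partitions=2):
    """Hypothetical single-machine imitation of a partitioned evaluation:
    each chunk of Training yields its local top-k, and the merged
    candidates are ranked again to obtain the global top-k."""
    chunks = [training[i::num_partitions] for i in range(num_partitions)]
    candidates = [p for chunk in chunks
                  for p in gknn_brute_force(query, chunk, k)]
    return gknn_brute_force(query, candidates, k)


# Illustrative data (assumed, not taken from the paper).
query_points = [(0, 0), (2, 0)]
training_points = [(1, 0), (5, 5), (0, 3), (1, 1)]

result = gknn_brute_force(query_points, training_points, k=2)
merged = gknn_partitioned(query_points, training_points, k=2)
print(result)  # [(1, 0), (1, 1)]
print(merged)  # [(1, 0), (1, 1)]
```

The local-then-global merge is valid here because the score of a Training point depends only on that point and the whole Query set, so any global top-K point must also be in the top-K of its own partition.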

Related resources

Multiple k Nearest Neighbor Query Processing in Spatial Network Databases

This paper concerns the efficient processing of multiple k nearest neighbor queries in a road-network setting. The assumed setting covers a range of scenarios such as the one where a large population of mobile service users that are constrained to a road network issue nearest-neighbor queries for points of interest that are accessible via the road network. Given multiple k nearest neighbor quer...

On efficient mutual nearest neighbor query processing in spatial databases

This paper studies a new form of nearest neighbor queries in spatial databases, namely, mutual nearest neighbor (MNN) search. Given a set D of objects and a query object q, an MNN query returns from D the set of objects that are among the k1 (≥ 1) nearest neighbors (NNs) of q and, meanwhile, have q as one of their k2 (≥ 1) NNs. Although MNN queries are useful in many applications involving decisio...

SPARQL query processing with Apache Spark

The number and the size of linked open data graphs keep growing at a fast pace and confront semantic RDF services with problems characterized as Big Data. Distributed query processing is one of them and needs to be efficiently addressed with execution guaranteeing scalability, high availability and fault tolerance. RDF data management systems requiring these properties are rarely built from sc...

A Review of various k-Nearest Neighbor Query Processing Techniques

Identifying the queried object in a large volume of uncertain data is a tedious task that involves high time and computational complexity. To address these complexities, various research techniques have been proposed. Among these, a simple, highly efficient and effective technique is the K-Nearest Neighbor (kNN) algorithm. It is a technique which has applications in vari...

Efficient mutual nearest neighbor query processing for moving object trajectories

Given a set D of trajectories, a query object q, and a query time extent C, a mutual (i.e., symmetric) nearest neighbor (MNN) query over trajectories finds from D the set of trajectories that are among the k1 nearest neighbors (NNs) of q within C and, meanwhile, have q as one of their k2 NNs. This type of query is useful in many applications such as decision making, data mining, and pattern rec...

Journal

Journal title: ISPRS International Journal of Geo-Information

Year: 2021

ISSN: 2220-9964

DOI: https://doi.org/10.3390/ijgi10110763